Can Image Retrieval help Visual Saliency Detection?
Abstract
We propose a novel image retrieval framework for visual saliency detection that uses information about salient objects contained within the bounding box annotations of similar images. For each test image, we train a customized SVM on similar example images to predict the saliency values of its object proposals, and we generate an external saliency map (ES) by aggregating the regional scores. To overcome limitations caused by the size of the training dataset, we also propose an internal optimization module that computes an internal saliency map (IS) by measuring the low-level contrast information of the test image. The two maps, ES and IS, have complementary properties, so we take a weighted combination of them to further improve detection performance. Experimental results on several challenging datasets demonstrate that the proposed algorithm performs favorably against state-of-the-art methods.

Visual saliency detection aims to find the most distinctive or important regions in an image and often serves as a preprocessing step for many computer vision tasks. Although numerous models and algorithms have been proposed in recent years, finding salient regions with high accuracy remains a challenging problem. Existing methods either perform saliency detection based solely on the test image itself [1, 14, 16, 33] or train parameters on a large dataset [18, 19, 24, 25]. The former approaches typically focus on low-level contrast properties, so they are limited when the salient object is similar in color to the background. The latter, supervised approaches train one generally suitable model for all test samples, which may not be the optimal solution for each specific image. Therefore, a good training set should be customized to the individual image.

The human vision system is sensitive to intense visual stimuli, such as color, texture and orientation.
However, the human ability to separate salient objects from chaotic backgrounds relies on prior knowledge accumulated over years of learning. Taking the test image in Figure 2 as an example, the salient ship and the background buildings have similar appearances, but we can immediately localize the ship because a general awareness has formed in our minds after seeing hundreds of ship models. Similar examples can enhance our contextual knowledge and provide valuable prior information for the test image.

Figure 1. Saliency maps generated by the proposed method: (a) Test image. (b) Ground truth. (c) Internal saliency map. (d) External saliency map. (e) Integrated result.

Footnote 1. This paper was an undergraduate final year project finished by Shuang Li at Dalian University of Technology in 2015 and was later revised by Peter Mathews.

We address the saliency detection problem from a different perspective than previous works. Motivated by the fact that similar images with pre-stored bounding box annotations contain important cues about the shapes, positions, and colors of the target objects, we design an image retrieval approach that searches a large dataset for similar example images and transfers saliency information from those examples to the test image. Compared with individual image-based approaches, most of which focus on the feature contrast within the test image, our method uses more accurate object information and is more robust to complex backgrounds. Furthermore, instead of using the whole dataset as training samples and treating each individual image equally, we select a subset of similar examples to train a customized SVM for each test image on the fly. However, due to the finite size of the annotation dataset, some test images with rare content may not find sufficiently similar examples.
Therefore, we also propose an internal optimization module based on low-level contrast to assist the image retrieval, and we take a weighted sum of the two maps to construct the final saliency map (EIS).

Figure 2. Pipeline of the proposed algorithm. Stages shown: test image, image retrieval, SVM, superpixel segments, internal optimization, internal saliency map, object proposals, external saliency map, final saliency map.

arXiv:1709.08172v1 [cs.CV] 24 Sep 2017

The main contributions of our work can be summarized as follows:
• We propose a novel image retrieval framework, which addresses saliency detection from a new perspective by transferring high-level object information from similar examples to the test image.
• We introduce an effective internal optimization module, which explores the discriminability and similarity between each pair of superpixels within the test image and serves as an essential supplement to the image retrieval.

The pipeline of the proposed method is shown in Figure 2. We first search a large dataset for similar examples and train a customized SVM classifier for each test image. Then we generate an internal saliency map by solving a joint optimization problem. Based on the internal saliency map, we pick out the most salient object proposals in the test image and predict their saliency values using the SVM classifier. By summing these saliency values, we construct an external saliency map. We then fuse it with the internal saliency map to generate the final saliency map. Extensive experiments on four benchmark datasets demonstrate that the proposed algorithm outperforms most state-of-the-art saliency detection methods. Several example results are shown in Figure 1.
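The retrieval, aggregation and fusion steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the Euclidean distance on global feature vectors, the peak normalization, and the single fusion weight are all assumptions made for illustration, and the proposal scores are taken as given (non-negative) outputs of the per-image SVM rather than computed here.

```python
from math import dist


def retrieve_similar(test_feat, dataset_feats, k):
    """Return indices of the k dataset images closest to the test image.

    Assumption: images are compared by Euclidean distance between
    global feature vectors; the source does not specify the measure.
    """
    order = sorted(range(len(dataset_feats)),
                   key=lambda i: dist(test_feat, dataset_feats[i]))
    return order[:k]


def external_map(height, width, proposals):
    """Build an external saliency map (ES) from scored object proposals.

    proposals: list of ((top, left, bottom, right), score) with
    bottom/right exclusive and score >= 0 (hypothetical SVM outputs).
    Each pixel accumulates the scores of all boxes covering it, and
    the map is then divided by its peak so values lie in [0, 1].
    """
    acc = [[0.0] * width for _ in range(height)]
    for (top, left, bottom, right), score in proposals:
        for y in range(top, bottom):
            for x in range(left, right):
                acc[y][x] += score
    peak = max(max(row) for row in acc)
    if peak <= 0:
        return acc  # no positive evidence: leave the map at zero
    return [[v / peak for v in row] for row in acc]


def fuse(internal, external, weight):
    """Weighted sum of IS and ES; `weight` is the share given to ES."""
    return [[(1.0 - weight) * i + weight * e for i, e in zip(ri, re)]
            for ri, re in zip(internal, external)]


# Toy usage on a 2x3 image with two overlapping proposals.
neighbours = retrieve_similar([0.0, 0.0],
                              [[3.0, 4.0], [1.0, 0.0], [0.0, 2.0]], k=2)
es = external_map(2, 3, [((0, 0, 2, 2), 0.5), ((0, 1, 1, 3), 1.0)])
is_map = [[0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
final = fuse(is_map, es, weight=0.5)
```

In this toy run the pixel covered by both boxes receives the highest external score, and the fused map stays high only where the internal and external maps agree, which mirrors the complementary role the text assigns to IS and ES.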
Similar resources
Compressed-Sampling-Based Image Saliency Detection in the Wavelet Domain
When watching natural scenes, an overwhelming amount of information is delivered to the Human Visual System (HVS). The optic nerve is estimated to receive around 10^8 bits of information a second. This large amount of information can't be processed right away by our neural system. The visual attention mechanism enables the HVS to spend neural resources efficiently, only on the selected parts of the...
Reduced-Reference Image Quality Assessment based on saliency region extraction
In this paper, a novel saliency theory based RR-IQA metric is introduced. As the human visual system is sensitive to the salient region, evaluating the image quality based on the salient region could increase the accuracy of the algorithm. In order to extract the salient regions, we use blob decomposition (BD) tool as a texture component descriptor. A new method for blob decomposition is propos...
Multiple Structure Based Saliency Detection and Its Application in Image Retrieval
Saliency detection is a hot research topic in both biological and computer vision. Salient structures, edges and regions contribute greatly to high-level semantic understanding of people's attention and improve retrieval precision, object detection, edge detection, etc. In this paper, based on the biological principle in visual system, we present a saliency detection system which combine...
Usability of Cluster Based Co-Saliency in Video Foreground Detection
Ability of human visual system to detect prominent regions in an image is fast, reliable and efficient. Computational modeling of this extraordinary behavior is termed as saliency detection. Saliency detection identifies salient regions in an image and is relevant in many computer vision applications such as object recognition, image segmentation, image retrieval etc. The term co-saliency can b...
A Saliency Detection Model via Fusing Extracted Low-level and High-level Features from an Image
Salient regions attract more human attention than other regions in an image. Low-level and high-level features are utilized in saliency region detection. Low-level features contain primitive information such as color or texture while high-level features usually consider visual systems. Recently, some salient region detection methods have been proposed based on only low-level features or hig...
Ant-Inspired Visual Saliency Detection in Image
Visual saliency detection has been of great research interest in recent years, since it has potential for a wide range of applications, such as object detection, content-based image retrieval and perceptual image compression. Human perceptual attention usually tends to first pick attended regions, which correspond to prominent objects in an image, rather than the whole image (Jams, 1890; Itti,...
Journal:
- CoRR
Volume abs/1709.08172
Publication date: 2017